Task1.Data structure.

doi:10.1371/journal.pone.0119491 There is eight groups of mice (7–10 per group): trisomic and controls (Genotype variable), CS mice injected with saline or with memantine (Treatment variable) and SC mice injected with saline or with memantine, as described.CS - this is the context-shock (CS) group, SC - the shock-context group (Behavior variable).

“Context fear conditioning (CFC) was performed as described [16,39,42]. Briefly, mice were placed in a novel cage (Med Associates, St. Albans, VT, Modular Mouse Test Chamber), allowed to explore for three minutes and then given an electric shock (2 s, 0.7 mA, constant electric current). These mice are the context-shock (CS) group and learn to associate the context with the aversive stimulus. These mice are the shock-context (SC) group and do not acquire conditioned fear.”

Data loading:

data_mice <- read.csv('Data_Cortex_Nuclear.csv')
str(data_mice)
## 'data.frame':    1080 obs. of  82 variables:
##  $ MouseID        : Factor w/ 1080 levels "18899_1","18899_10",..: 46 53 54 55 56 57 58 59 60 47 ...
##  $ DYRK1A_N       : num  0.504 0.515 0.509 0.442 0.435 ...
##  $ ITSN1_N        : num  0.747 0.689 0.73 0.617 0.617 ...
##  $ BDNF_N         : num  0.43 0.412 0.418 0.359 0.359 ...
##  $ NR1_N          : num  2.82 2.79 2.69 2.47 2.37 ...
##  $ NR2A_N         : num  5.99 5.69 5.62 4.98 4.72 ...
##  $ pAKT_N         : num  0.219 0.212 0.209 0.223 0.213 ...
##  $ pBRAF_N        : num  0.178 0.173 0.176 0.176 0.174 ...
##  $ pCAMKII_N      : num  2.37 2.29 2.28 2.15 2.13 ...
##  $ pCREB_N        : num  0.232 0.227 0.23 0.207 0.192 ...
##  $ pELK_N         : num  1.75 1.6 1.56 1.6 1.5 ...
##  $ pERK_N         : num  0.688 0.695 0.677 0.583 0.551 ...
##  $ pJNK_N         : num  0.306 0.299 0.291 0.297 0.287 ...
##  $ PKCA_N         : num  0.403 0.386 0.381 0.377 0.364 ...
##  $ pMEK_N         : num  0.297 0.281 0.282 0.314 0.278 ...
##  $ pNR1_N         : num  1.022 0.957 1.004 0.875 0.865 ...
##  $ pNR2A_N        : num  0.606 0.588 0.602 0.52 0.508 ...
##  $ pNR2B_N        : num  1.88 1.73 1.73 1.57 1.48 ...
##  $ pPKCAB_N       : num  2.31 2.04 2.02 2.13 2.01 ...
##  $ pRSK_N         : num  0.442 0.445 0.468 0.478 0.483 ...
##  $ AKT_N          : num  0.859 0.835 0.814 0.728 0.688 ...
##  $ BRAF_N         : num  0.416 0.4 0.4 0.386 0.368 ...
##  $ CAMKII_N       : num  0.37 0.356 0.368 0.363 0.355 ...
##  $ CREB_N         : num  0.179 0.174 0.174 0.179 0.175 ...
##  $ ELK_N          : num  1.87 1.76 1.77 1.29 1.32 ...
##  $ ERK_N          : num  3.69 3.49 3.57 2.97 2.9 ...
##  $ GSK3B_N        : num  1.54 1.51 1.5 1.42 1.36 ...
##  $ JNK_N          : num  0.265 0.256 0.26 0.26 0.251 ...
##  $ MEK_N          : num  0.32 0.304 0.312 0.279 0.274 ...
##  $ TRKA_N         : num  0.814 0.781 0.785 0.734 0.703 ...
##  $ RSK_N          : num  0.166 0.157 0.161 0.162 0.155 ...
##  $ APP_N          : num  0.454 0.431 0.423 0.411 0.399 ...
##  $ Bcatenin_N     : num  3.04 2.92 2.94 2.5 2.46 ...
##  $ SOD1_N         : num  0.37 0.342 0.344 0.345 0.329 ...
##  $ MTOR_N         : num  0.459 0.424 0.425 0.429 0.409 ...
##  $ P38_N          : num  0.335 0.325 0.325 0.33 0.313 ...
##  $ pMTOR_N        : num  0.825 0.762 0.757 0.747 0.692 ...
##  $ DSCR1_N        : num  0.577 0.545 0.544 0.547 0.537 ...
##  $ AMPKA_N        : num  0.448 0.421 0.405 0.387 0.361 ...
##  $ NR2B_N         : num  0.586 0.545 0.553 0.548 0.513 ...
##  $ pNUMB_N        : num  0.395 0.368 0.364 0.367 0.352 ...
##  $ RAPTOR_N       : num  0.34 0.322 0.313 0.328 0.312 ...
##  $ TIAM1_N        : num  0.483 0.455 0.447 0.443 0.419 ...
##  $ pP70S6_N       : num  0.294 0.276 0.257 0.399 0.393 ...
##  $ NUMB_N         : num  0.182 0.182 0.184 0.162 0.16 ...
##  $ P70S6_N        : num  0.843 0.848 0.856 0.76 0.768 ...
##  $ pGSK3B_N       : num  0.193 0.195 0.201 0.184 0.186 ...
##  $ pPKCG_N        : num  1.44 1.44 1.52 1.61 1.65 ...
##  $ CDK5_N         : num  0.295 0.294 0.302 0.296 0.297 ...
##  $ S6_N           : num  0.355 0.355 0.386 0.291 0.309 ...
##  $ ADARB1_N       : num  1.34 1.31 1.28 1.2 1.21 ...
##  $ AcetylH3K9_N   : num  0.17 0.171 0.185 0.16 0.165 ...
##  $ RRP1_N         : num  0.159 0.158 0.149 0.166 0.161 ...
##  $ BAX_N          : num  0.189 0.185 0.191 0.185 0.188 ...
##  $ ARC_N          : num  0.106 0.107 0.108 0.103 0.105 ...
##  $ ERBB4_N        : num  0.145 0.15 0.145 0.141 0.142 ...
##  $ nNOS_N         : num  0.177 0.178 0.176 0.164 0.168 ...
##  $ Tau_N          : num  0.125 0.134 0.133 0.123 0.137 ...
##  $ GFAP_N         : num  0.115 0.118 0.118 0.117 0.116 ...
##  $ GluR3_N        : num  0.228 0.238 0.245 0.235 0.256 ...
##  $ GluR4_N        : num  0.143 0.142 0.142 0.145 0.141 ...
##  $ IL1B_N         : num  0.431 0.457 0.51 0.431 0.481 ...
##  $ P3525_N        : num  0.248 0.258 0.255 0.251 0.252 ...
##  $ pCASP9_N       : num  1.6 1.67 1.66 1.48 1.53 ...
##  $ PSD95_N        : num  2.01 2 2.02 1.96 2.01 ...
##  $ SNCA_N         : num  0.108 0.11 0.108 0.12 0.12 ...
##  $ Ubiquitin_N    : num  1.045 1.01 0.997 0.99 0.998 ...
##  $ pGSK3B_Tyr216_N: num  0.832 0.849 0.847 0.833 0.879 ...
##  $ SHH_N          : num  0.189 0.2 0.194 0.192 0.206 ...
##  $ BAD_N          : num  0.123 0.117 0.119 0.133 0.13 ...
##  $ BCL2_N         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pS6_N          : num  0.106 0.107 0.108 0.103 0.105 ...
##  $ pCFOS_N        : num  0.108 0.104 0.106 0.111 0.111 ...
##  $ SYP_N          : num  0.427 0.442 0.436 0.392 0.434 ...
##  $ H3AcK18_N      : num  0.115 0.112 0.112 0.13 0.118 ...
##  $ EGR1_N         : num  0.132 0.135 0.133 0.147 0.14 ...
##  $ H3MeK4_N       : num  0.128 0.131 0.127 0.147 0.148 ...
##  $ CaNA_N         : num  1.68 1.74 1.93 1.7 1.84 ...
##  $ Genotype       : Factor w/ 2 levels "Control","Ts65Dn": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment      : Factor w/ 2 levels "Memantine","Saline": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Behavior       : Factor w/ 2 levels "C/S","S/C": 1 1 1 1 1 1 1 1 1 1 ...
##  $ class          : Factor w/ 8 levels "c-CS-m","c-CS-s",..: 1 1 1 1 1 1 1 1 1 1 ...

NA values:

colSums(is.na(data_mice))
##         MouseID        DYRK1A_N         ITSN1_N          BDNF_N           NR1_N 
##               0               3               3               3               3 
##          NR2A_N          pAKT_N         pBRAF_N       pCAMKII_N         pCREB_N 
##               3               3               3               3               3 
##          pELK_N          pERK_N          pJNK_N          PKCA_N          pMEK_N 
##               3               3               3               3               3 
##          pNR1_N         pNR2A_N         pNR2B_N        pPKCAB_N          pRSK_N 
##               3               3               3               3               3 
##           AKT_N          BRAF_N        CAMKII_N          CREB_N           ELK_N 
##               3               3               3               3              18 
##           ERK_N         GSK3B_N           JNK_N           MEK_N          TRKA_N 
##               3               3               3               7               3 
##           RSK_N           APP_N      Bcatenin_N          SOD1_N          MTOR_N 
##               3               3              18               3               3 
##           P38_N         pMTOR_N         DSCR1_N         AMPKA_N          NR2B_N 
##               3               3               3               3               3 
##         pNUMB_N        RAPTOR_N         TIAM1_N        pP70S6_N          NUMB_N 
##               3               3               3               3               0 
##         P70S6_N        pGSK3B_N         pPKCG_N          CDK5_N            S6_N 
##               0               0               0               0               0 
##        ADARB1_N    AcetylH3K9_N          RRP1_N           BAX_N           ARC_N 
##               0               0               0               0               0 
##         ERBB4_N          nNOS_N           Tau_N          GFAP_N         GluR3_N 
##               0               0               0               0               0 
##         GluR4_N          IL1B_N         P3525_N        pCASP9_N         PSD95_N 
##               0               0               0               0               0 
##          SNCA_N     Ubiquitin_N pGSK3B_Tyr216_N           SHH_N           BAD_N 
##               0               0               0               0             213 
##          BCL2_N           pS6_N         pCFOS_N           SYP_N       H3AcK18_N 
##             285               0              75               0             180 
##          EGR1_N        H3MeK4_N          CaNA_N        Genotype       Treatment 
##             210             270               0               0               0 
##        Behavior           class 
##               0               0

Number of observations by groups:

data_mice$class <- as.factor(data_mice$class)
#Число переменных в каждой группе, выделенной в зависимости от class
summary(data_mice$class) #или table(data_mice$class)
## c-CS-m c-CS-s c-SC-m c-SC-s t-CS-m t-CS-s t-SC-m t-SC-s 
##    150    135    150    135    135    105    135    135

There is a small variation in the number of observations in groups. The groups are not balanced.

Task2. Dependency of BDNF_N variable on a class.

It can be seen that the number of variables in the groups is not the same.

summary(data_mice$BDNF_N ~ data_mice$class)
## data_mice$BDNF_N     N= 1077 , 3 Missing 
## 
## +---------------+------+----+----------------+
## |               |      |N   |data_mice$BDNF_N|
## +---------------+------+----+----------------+
## |data_mice$class|c-CS-m| 150|0.3392174       |
## |               |c-CS-s| 135|0.3423153       |
## |               |c-SC-m| 150|0.2909458       |
## |               |c-SC-s| 135|0.3133925       |
## |               |t-CS-m| 135|0.3127322       |
## |               |t-CS-s| 105|0.3054604       |
## |               |t-SC-m| 135|0.3210633       |
## |               |t-SC-s| 132|0.3255864       |
## +---------------+------+----+----------------+
## |Overall        |      |1077|0.3190884       |
## +---------------+------+----+----------------+

There are three NA values in the BDNF_N column (in the t-CS-s subgroup).

theme_set(theme_bw())
ggplot(data_mice, aes(class, BDNF_N, color = class)) + stat_summary(fun.data = "mean_cl_normal") + ggtitle(label = "BDNF_N")

Picture 1 Graph of BDNF_N dependence on the “class” variable.

Let’s check if there are differences in the level of BDNF_N production depending on the class in the experiment using Anova.

We can use Anova if: -Normal distribution of residuals -Сonstancy of residuals variance -Observation independence (no multicollinearity) -No influential observations

Linear model:

mod_mice <- lm(BDNF_N ~ Genotype*Treatment*Behavior, data = data_mice, contrasts = list(Genotype = contr.sum, Treatment = contr.sum, Behavior = contr.sum))
summary(mod_mice)
## 
## Call:
## lm(formula = BDNF_N ~ Genotype * Treatment * Behavior, data = data_mice, 
##     contrasts = list(Genotype = contr.sum, Treatment = contr.sum, 
##         Behavior = contr.sum))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.175764 -0.028777 -0.001609  0.028701  0.159388 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     0.3188392  0.0014321 222.636  < 2e-16 ***
## Genotype1                       0.0026286  0.0014321   1.835  0.06672 .  
## Treatment1                     -0.0028495  0.0014321  -1.990  0.04688 *  
## Behavior1                       0.0060922  0.0014321   4.254 2.28e-05 ***
## Genotype1:Treatment1           -0.0035367  0.0014321  -2.470  0.01368 *  
## Genotype1:Behavior1             0.0132064  0.0014321   9.222  < 2e-16 ***
## Treatment1:Behavior1            0.0038930  0.0014321   2.718  0.00667 ** 
## Genotype1:Treatment1:Behavior1  0.0009442  0.0014321   0.659  0.50982    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04675 on 1069 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.1097, Adjusted R-squared:  0.1039 
## F-statistic: 18.82 on 7 and 1069 DF,  p-value: < 2.2e-16

–Observation independence (no multicollinearity): Let’s calculate the value Variance inflation factor (VIF).

vif(mod_mice)
##                    Genotype                   Treatment 
##                    1.007281                    1.007281 
##                    Behavior          Genotype:Treatment 
##                    1.010105                    1.010732 
##           Genotype:Behavior          Treatment:Behavior 
##                    1.010105                    1.010105 
## Genotype:Treatment:Behavior 
##                    1.010105

There is no multicollinearity. All VIF values less than 2.

–No influential observations (Picture 1)

theme_set(theme_bw())
mod_diag <- fortify(mod_mice)
ggplot(mod_diag, aes(x = 1:nrow(mod_diag), y = .cooksd)) + geom_bar(stat = "identity") + ggtitle("Graph of Cook's distances")

Picture 1 Graph of Cook’s distances

None of the values exceed the conditional threshold of 2 units. No influential observations.

–Normal distribution of residues (Picture 2)

qqPlot(mod_diag$.stdresid, main = 'Graph Q-Q')

## [1] 182 918

Picture 2 Q-Q graph. This is normal distribution.

ANOVA

mice_anova <- Anova(mod_mice, type = "III")
mice_anova
## Anova Table (Type III tests)
## 
## Response: BDNF_N
##                              Sum Sq   Df    F value    Pr(>F)    
## (Intercept)                 108.323    1 49566.6260 < 2.2e-16 ***
## Genotype                      0.007    1     3.3689  0.066715 .  
## Treatment                     0.009    1     3.9590  0.046877 *  
## Behavior                      0.040    1    18.0964 2.284e-05 ***
## Genotype:Treatment            0.013    1     6.0987  0.013684 *  
## Genotype:Behavior             0.186    1    85.0389 < 2.2e-16 ***
## Treatment:Behavior            0.016    1     7.3894  0.006667 ** 
## Genotype:Treatment:Behavior   0.001    1     0.4347  0.509820    
## Residuals                     2.336 1069                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can notice that BDNF_N significantly depends on Treatment and Behavior. There is also a significant interaction of factors. Let us apply post hoc criteria to identify which groups there are differences in the average protein level.

data_mice$inter <- interaction(data_mice$Genotype, data_mice$Treatment, data_mice$Behavior)

mod_inter <- lm(BDNF_N ~ -1 + inter, data = data_mice)
data_tukey <- glht(mod_inter, linfct = mcp(inter= 'Tukey') )
summary(data_tukey)
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = BDNF_N ~ -1 + inter, data = data_mice)
## 
## Linear Hypotheses:
##                                                      Estimate Std. Error
## Ts65Dn.Memantine.C/S - Control.Memantine.C/S == 0  -0.0264852  0.0055459
## Control.Saline.C/S - Control.Memantine.C/S == 0     0.0030979  0.0055459
## Ts65Dn.Saline.C/S - Control.Memantine.C/S == 0     -0.0337570  0.0059483
## Control.Memantine.S/C - Control.Memantine.C/S == 0 -0.0482717  0.0053980
## Ts65Dn.Memantine.S/C - Control.Memantine.C/S == 0  -0.0181541  0.0055459
## Control.Saline.S/C - Control.Memantine.C/S == 0    -0.0258249  0.0055459
## Ts65Dn.Saline.S/C - Control.Memantine.C/S == 0     -0.0136310  0.0055790
## Control.Saline.C/S - Ts65Dn.Memantine.C/S == 0      0.0295831  0.0056900
## Ts65Dn.Saline.C/S - Ts65Dn.Memantine.C/S == 0      -0.0072718  0.0060829
## Control.Memantine.S/C - Ts65Dn.Memantine.C/S == 0  -0.0217865  0.0055459
## Ts65Dn.Memantine.S/C - Ts65Dn.Memantine.C/S == 0    0.0083311  0.0056900
## Control.Saline.S/C - Ts65Dn.Memantine.C/S == 0      0.0006603  0.0056900
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.C/S == 0       0.0128542  0.0057223
## Ts65Dn.Saline.C/S - Control.Saline.C/S == 0        -0.0368549  0.0060829
## Control.Memantine.S/C - Control.Saline.C/S == 0    -0.0513696  0.0055459
## Ts65Dn.Memantine.S/C - Control.Saline.C/S == 0     -0.0212520  0.0056900
## Control.Saline.S/C - Control.Saline.C/S == 0       -0.0289228  0.0056900
## Ts65Dn.Saline.S/C - Control.Saline.C/S == 0        -0.0167289  0.0057223
## Control.Memantine.S/C - Ts65Dn.Saline.C/S == 0     -0.0145147  0.0059483
## Ts65Dn.Memantine.S/C - Ts65Dn.Saline.C/S == 0       0.0156029  0.0060829
## Control.Saline.S/C - Ts65Dn.Saline.C/S == 0         0.0079321  0.0060829
## Ts65Dn.Saline.S/C - Ts65Dn.Saline.C/S == 0          0.0201260  0.0061130
## Ts65Dn.Memantine.S/C - Control.Memantine.S/C == 0   0.0301176  0.0055459
## Control.Saline.S/C - Control.Memantine.S/C == 0     0.0224468  0.0055459
## Ts65Dn.Saline.S/C - Control.Memantine.S/C == 0      0.0346406  0.0055790
## Control.Saline.S/C - Ts65Dn.Memantine.S/C == 0     -0.0076708  0.0056900
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.S/C == 0       0.0045231  0.0057223
## Ts65Dn.Saline.S/C - Control.Saline.S/C == 0         0.0121939  0.0057223
##                                                    t value Pr(>|t|)    
## Ts65Dn.Memantine.C/S - Control.Memantine.C/S == 0   -4.776  < 0.001 ***
## Control.Saline.C/S - Control.Memantine.C/S == 0      0.559  0.99930    
## Ts65Dn.Saline.C/S - Control.Memantine.C/S == 0      -5.675  < 0.001 ***
## Control.Memantine.S/C - Control.Memantine.C/S == 0  -8.942  < 0.001 ***
## Ts65Dn.Memantine.S/C - Control.Memantine.C/S == 0   -3.273  0.02416 *  
## Control.Saline.S/C - Control.Memantine.C/S == 0     -4.657  < 0.001 ***
## Ts65Dn.Saline.S/C - Control.Memantine.C/S == 0      -2.443  0.22127    
## Control.Saline.C/S - Ts65Dn.Memantine.C/S == 0       5.199  < 0.001 ***
## Ts65Dn.Saline.C/S - Ts65Dn.Memantine.C/S == 0       -1.195  0.93313    
## Control.Memantine.S/C - Ts65Dn.Memantine.C/S == 0   -3.928  0.00245 ** 
## Ts65Dn.Memantine.S/C - Ts65Dn.Memantine.C/S == 0     1.464  0.82608    
## Control.Saline.S/C - Ts65Dn.Memantine.C/S == 0       0.116  1.00000    
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.C/S == 0        2.246  0.32444    
## Ts65Dn.Saline.C/S - Control.Saline.C/S == 0         -6.059  < 0.001 ***
## Control.Memantine.S/C - Control.Saline.C/S == 0     -9.263  < 0.001 ***
## Ts65Dn.Memantine.S/C - Control.Saline.C/S == 0      -3.735  0.00468 ** 
## Control.Saline.S/C - Control.Saline.C/S == 0        -5.083  < 0.001 ***
## Ts65Dn.Saline.S/C - Control.Saline.C/S == 0         -2.923  0.06852 .  
## Control.Memantine.S/C - Ts65Dn.Saline.C/S == 0      -2.440  0.22320    
## Ts65Dn.Memantine.S/C - Ts65Dn.Saline.C/S == 0        2.565  0.16959    
## Control.Saline.S/C - Ts65Dn.Saline.C/S == 0          1.304  0.89727    
## Ts65Dn.Saline.S/C - Ts65Dn.Saline.C/S == 0           3.292  0.02274 *  
## Ts65Dn.Memantine.S/C - Control.Memantine.S/C == 0    5.431  < 0.001 ***
## Control.Saline.S/C - Control.Memantine.S/C == 0      4.047  0.00156 ** 
## Ts65Dn.Saline.S/C - Control.Memantine.S/C == 0       6.209  < 0.001 ***
## Control.Saline.S/C - Ts65Dn.Memantine.S/C == 0      -1.348  0.87979    
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.S/C == 0        0.790  0.99360    
## Ts65Dn.Saline.S/C - Control.Saline.S/C == 0          2.131  0.39477    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Visualization of post hoc analysis:

theme_set(theme_bw())
MyData <- expand.grid(Genotype = levels(data_mice$Genotype), Treatment =levels(data_mice$Treatment), Behavior = levels(data_mice$Behavior))

MyData <- data.frame(
  MyData,
  predict(mod_mice, newdata = MyData, interval = 'confidence')
)

pos <- position_dodge(width = 0.2)
gg_linep <- ggplot(data = MyData, aes(x = Treatment, y = fit,
                                      ymin = lwr, ymax = upr, colour = Genotype)) +
  geom_point(position = pos) +
  geom_errorbar(position = pos, width =0.2 )
gg_linep

Picture 3 Influence of Treatment to BDNF_N level depending on the Genotype.

theme_set(theme_bw())
MyData <- expand.grid(Genotype = levels(data_mice$Genotype), Treatment =levels(data_mice$Treatment), Behavior = levels(data_mice$Behavior))

MyData <- data.frame(
  MyData,
  predict(mod_mice, newdata = MyData, interval = 'confidence')
)

pos <- position_dodge(width = 0.2)
gg_linep <- ggplot(data = MyData, aes(x = Behavior, y = fit,
                                      ymin = lwr, ymax = upr, colour = Genotype)) +
  geom_point(position = pos) +
  geom_errorbar(position = pos, width =0.2 )
gg_linep

Picture 4 Influence of Behavior to BDNF_N level depending on the Genotype.

theme_set(theme_bw())
MyData <- expand.grid(Genotype = levels(data_mice$Genotype), Treatment =levels(data_mice$Treatment), Behavior = levels(data_mice$Behavior))

MyData <- data.frame(
  MyData,
  predict(mod_mice, newdata = MyData, interval = 'confidence')
)

pos <- position_dodge(width = 0.2)
gg_linep <- ggplot(data = MyData, aes(x = Behavior, y = fit,
                                      ymin = lwr, ymax = upr, colour = Treatment)) +
  geom_point(position = pos) +
  geom_errorbar(position = pos, width =0.2 )
gg_linep

Picture 5 Influence of Behavior to BDNF_N level depending on the Treatment.

Conclusions: 1.Ts65Dn.Memantine.C/S - Control.Memantine.C/S == 0 Memantine has different effects on BDNF_N of healthy mice and mice with Down syndrome. 2.Control.Memantine.S/C - Control.Memantine.C/S == 0 Context fear conditioning has an effect on BDNF_N even in healthy mice. 3.Ts65Dn.Saline.C/S - Control.Saline.C/S == 0 BDNF_N are really significantly different in healthy mice and mice with Down syndrome. 4.Ts65Dn.Saline.S/C - Ts65Dn.Saline.C/S == 0 Context fear conditioning has an effect on BDNF_N in mice with Down syndrome. 5.Ts65Dn.Memantine.S/C - Control.Memantine.S/C == 0 Memantine affects BDNF_N protein regardless of сontext fear conditioning. 6.Control.Saline.S/C - Control.Memantine.S/C == 0 The choice of drug affects on BDNF_N even in healthy mice.

Task3. Linear model of the ERBB4_N protein.

Build a linear model capable of predicting the level of production of the ERBB4_N protein based on data on other proteins in the experiment

Leave only numeric variables.

data_mice_num <- select_if(data_mice, is.numeric)
fit <- lm(ERBB4_N ~ ., data = data_mice_num)
summary(fit)
## 
## Call:
## lm(formula = ERBB4_N ~ ., data = data_mice_num)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023037 -0.003370 -0.000127  0.003071  0.031955 
## 
## Coefficients: (1 not defined because of singularities)
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2.566e-02  8.245e-03   3.112 0.001970 ** 
## DYRK1A_N        -6.824e-03  7.475e-03  -0.913 0.361710    
## ITSN1_N          8.127e-03  1.058e-02   0.768 0.442776    
## BDNF_N           3.833e-02  1.919e-02   1.997 0.046378 *  
## NR1_N           -1.409e-02  6.685e-03  -2.108 0.035544 *  
## NR2A_N          -1.049e-04  1.361e-03  -0.077 0.938610    
## pAKT_N           7.961e-02  3.044e-02   2.615 0.009203 ** 
## pBRAF_N         -4.103e-02  3.065e-02  -1.339 0.181356    
## pCAMKII_N       -1.451e-05  7.954e-04  -0.018 0.985451    
## pCREB_N         -6.015e-02  2.910e-02  -2.067 0.039290 *  
## pELK_N           4.731e-05  1.030e-03   0.046 0.963391    
## pERK_N          -1.987e-02  7.887e-03  -2.519 0.012109 *  
## pJNK_N          -6.035e-02  2.538e-02  -2.378 0.017814 *  
## PKCA_N           8.377e-03  2.736e-02   0.306 0.759625    
## pMEK_N           8.498e-03  2.706e-02   0.314 0.753676    
## pNR1_N          -2.566e-02  1.334e-02  -1.924 0.054968 .  
## pNR2A_N          1.004e-02  7.239e-03   1.388 0.165927    
## pNR2B_N          1.928e-02  6.659e-03   2.896 0.003959 ** 
## pPKCAB_N         5.727e-03  2.773e-03   2.065 0.039457 *  
## pRSK_N           1.001e-02  1.427e-02   0.701 0.483573    
## AKT_N           -6.738e-04  8.183e-03  -0.082 0.934413    
## BRAF_N           1.550e-02  1.213e-02   1.278 0.201926    
## CAMKII_N        -3.429e-03  2.011e-02  -0.170 0.864700    
## CREB_N          -1.450e-02  3.091e-02  -0.469 0.639135    
## ELK_N            3.510e-03  3.719e-03   0.944 0.345811    
## ERK_N            4.660e-03  2.804e-03   1.662 0.097208 .  
## GSK3B_N         -5.840e-03  6.824e-03  -0.856 0.392588    
## JNK_N           -1.188e-02  3.378e-02  -0.352 0.725317    
## MEK_N            8.785e-03  2.197e-02   0.400 0.689510    
## TRKA_N           5.722e-03  1.646e-02   0.348 0.728257    
## RSK_N           -2.299e-02  3.915e-02  -0.587 0.557214    
## APP_N           -5.345e-03  1.825e-02  -0.293 0.769751    
## Bcatenin_N       1.107e-03  5.298e-03   0.209 0.834546    
## SOD1_N          -6.241e-03  3.985e-03  -1.566 0.118018    
## MTOR_N           4.161e-02  1.651e-02   2.520 0.012048 *  
## P38_N           -1.619e-02  1.348e-02  -1.201 0.230267    
## pMTOR_N         -9.530e-03  7.667e-03  -1.243 0.214475    
## DSCR1_N         -6.849e-03  1.023e-02  -0.670 0.503374    
## AMPKA_N          2.522e-02  2.294e-02   1.099 0.272263    
## NR2B_N           7.391e-03  9.146e-03   0.808 0.419449    
## pNUMB_N         -6.467e-03  2.006e-02  -0.322 0.747326    
## RAPTOR_N         4.750e-02  2.465e-02   1.927 0.054629 .  
## TIAM1_N         -3.284e-02  1.956e-02  -1.679 0.093892 .  
## pP70S6_N         2.739e-03  5.307e-03   0.516 0.606022    
## NUMB_N          -3.304e-02  3.784e-02  -0.873 0.383042    
## P70S6_N         -4.357e-03  5.119e-03  -0.851 0.395118    
## pGSK3B_N         1.291e-01  4.063e-02   3.179 0.001577 ** 
## pPKCG_N         -7.636e-03  1.962e-03  -3.893 0.000113 ***
## CDK5_N           7.111e-04  9.807e-03   0.073 0.942225    
## S6_N             1.566e-02  7.647e-03   2.049 0.041059 *  
## ADARB1_N        -2.609e-03  2.040e-03  -1.279 0.201557    
## AcetylH3K9_N     2.113e-03  1.103e-02   0.192 0.848199    
## RRP1_N          -1.488e-02  1.038e-02  -1.434 0.152232    
## BAX_N           -1.304e-01  3.542e-02  -3.680 0.000260 ***
## ARC_N            1.687e-01  6.748e-02   2.499 0.012775 *  
## nNOS_N          -4.749e-03  2.854e-02  -0.166 0.867918    
## Tau_N            7.082e-02  1.810e-02   3.913 0.000104 ***
## GFAP_N          -4.703e-02  5.400e-02  -0.871 0.384291    
## GluR3_N         -7.133e-03  2.436e-02  -0.293 0.769812    
## GluR4_N         -7.030e-02  3.373e-02  -2.084 0.037672 *  
## IL1B_N           2.845e-02  1.081e-02   2.631 0.008794 ** 
## P3525_N          4.994e-02  2.423e-02   2.061 0.039810 *  
## pCASP9_N         8.118e-03  3.044e-03   2.667 0.007923 ** 
## PSD95_N          2.379e-02  3.811e-03   6.242 9.55e-10 ***
## SNCA_N          -1.137e-02  3.484e-02  -0.326 0.744195    
## Ubiquitin_N      2.330e-03  4.883e-03   0.477 0.633468    
## pGSK3B_Tyr216_N  2.808e-02  1.006e-02   2.793 0.005439 ** 
## SHH_N            4.904e-02  1.998e-02   2.454 0.014471 *  
## BAD_N           -3.592e-02  3.508e-02  -1.024 0.306311    
## BCL2_N           6.173e-03  3.112e-02   0.198 0.842840    
## pS6_N                   NA         NA      NA       NA    
## pCFOS_N         -1.232e-02  3.111e-02  -0.396 0.692240    
## SYP_N            1.354e-02  1.079e-02   1.254 0.210395    
## H3AcK18_N        2.401e-03  2.527e-02   0.095 0.924349    
## EGR1_N          -6.016e-03  2.619e-02  -0.230 0.818406    
## H3MeK4_N         1.227e-02  2.154e-02   0.570 0.569097    
## CaNA_N          -1.749e-03  3.419e-03  -0.512 0.609159    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.005885 on 476 degrees of freedom
##   (528 observations deleted due to missingness)
## Multiple R-squared:  0.8641, Adjusted R-squared:  0.8427 
## F-statistic: 40.37 on 75 and 476 DF,  p-value: < 2.2e-16

The variable pS6_N is ideally collinear with ERBB4_N, its impact cannot be estimated. Let’s exclude it from the model.

data_mice_num <- data_mice_num[,-71]
fit <- lm(ERBB4_N ~ ., data = data_mice_num)
summary(fit)
## 
## Call:
## lm(formula = ERBB4_N ~ ., data = data_mice_num)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023037 -0.003370 -0.000127  0.003071  0.031955 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2.566e-02  8.245e-03   3.112 0.001970 ** 
## DYRK1A_N        -6.824e-03  7.475e-03  -0.913 0.361710    
## ITSN1_N          8.127e-03  1.058e-02   0.768 0.442776    
## BDNF_N           3.833e-02  1.919e-02   1.997 0.046378 *  
## NR1_N           -1.409e-02  6.685e-03  -2.108 0.035544 *  
## NR2A_N          -1.049e-04  1.361e-03  -0.077 0.938610    
## pAKT_N           7.961e-02  3.044e-02   2.615 0.009203 ** 
## pBRAF_N         -4.103e-02  3.065e-02  -1.339 0.181356    
## pCAMKII_N       -1.451e-05  7.954e-04  -0.018 0.985451    
## pCREB_N         -6.015e-02  2.910e-02  -2.067 0.039290 *  
## pELK_N           4.731e-05  1.030e-03   0.046 0.963391    
## pERK_N          -1.987e-02  7.887e-03  -2.519 0.012109 *  
## pJNK_N          -6.035e-02  2.538e-02  -2.378 0.017814 *  
## PKCA_N           8.377e-03  2.736e-02   0.306 0.759625    
## pMEK_N           8.498e-03  2.706e-02   0.314 0.753676    
## pNR1_N          -2.566e-02  1.334e-02  -1.924 0.054968 .  
## pNR2A_N          1.004e-02  7.239e-03   1.388 0.165927    
## pNR2B_N          1.928e-02  6.659e-03   2.896 0.003959 ** 
## pPKCAB_N         5.727e-03  2.773e-03   2.065 0.039457 *  
## pRSK_N           1.001e-02  1.427e-02   0.701 0.483573    
## AKT_N           -6.738e-04  8.183e-03  -0.082 0.934413    
## BRAF_N           1.550e-02  1.213e-02   1.278 0.201926    
## CAMKII_N        -3.429e-03  2.011e-02  -0.170 0.864700    
## CREB_N          -1.450e-02  3.091e-02  -0.469 0.639135    
## ELK_N            3.510e-03  3.719e-03   0.944 0.345811    
## ERK_N            4.660e-03  2.804e-03   1.662 0.097208 .  
## GSK3B_N         -5.840e-03  6.824e-03  -0.856 0.392588    
## JNK_N           -1.188e-02  3.378e-02  -0.352 0.725317    
## MEK_N            8.785e-03  2.197e-02   0.400 0.689510    
## TRKA_N           5.722e-03  1.646e-02   0.348 0.728257    
## RSK_N           -2.299e-02  3.915e-02  -0.587 0.557214    
## APP_N           -5.345e-03  1.825e-02  -0.293 0.769751    
## Bcatenin_N       1.107e-03  5.298e-03   0.209 0.834546    
## SOD1_N          -6.241e-03  3.985e-03  -1.566 0.118018    
## MTOR_N           4.161e-02  1.651e-02   2.520 0.012048 *  
## P38_N           -1.619e-02  1.348e-02  -1.201 0.230267    
## pMTOR_N         -9.530e-03  7.667e-03  -1.243 0.214475    
## DSCR1_N         -6.849e-03  1.023e-02  -0.670 0.503374    
## AMPKA_N          2.522e-02  2.294e-02   1.099 0.272263    
## NR2B_N           7.391e-03  9.146e-03   0.808 0.419449    
## pNUMB_N         -6.467e-03  2.006e-02  -0.322 0.747326    
## RAPTOR_N         4.750e-02  2.465e-02   1.927 0.054629 .  
## TIAM1_N         -3.284e-02  1.956e-02  -1.679 0.093892 .  
## pP70S6_N         2.739e-03  5.307e-03   0.516 0.606022    
## NUMB_N          -3.304e-02  3.784e-02  -0.873 0.383042    
## P70S6_N         -4.357e-03  5.119e-03  -0.851 0.395118    
## pGSK3B_N         1.291e-01  4.063e-02   3.179 0.001577 ** 
## pPKCG_N         -7.636e-03  1.962e-03  -3.893 0.000113 ***
## CDK5_N           7.111e-04  9.807e-03   0.073 0.942225    
## S6_N             1.566e-02  7.647e-03   2.049 0.041059 *  
## ADARB1_N        -2.609e-03  2.040e-03  -1.279 0.201557    
## AcetylH3K9_N     2.113e-03  1.103e-02   0.192 0.848199    
## RRP1_N          -1.488e-02  1.038e-02  -1.434 0.152232    
## BAX_N           -1.304e-01  3.542e-02  -3.680 0.000260 ***
## ARC_N            1.687e-01  6.748e-02   2.499 0.012775 *  
## nNOS_N          -4.749e-03  2.854e-02  -0.166 0.867918    
## Tau_N            7.082e-02  1.810e-02   3.913 0.000104 ***
## GFAP_N          -4.703e-02  5.400e-02  -0.871 0.384291    
## GluR3_N         -7.133e-03  2.436e-02  -0.293 0.769812    
## GluR4_N         -7.030e-02  3.373e-02  -2.084 0.037672 *  
## IL1B_N           2.845e-02  1.081e-02   2.631 0.008794 ** 
## P3525_N          4.994e-02  2.423e-02   2.061 0.039810 *  
## pCASP9_N         8.118e-03  3.044e-03   2.667 0.007923 ** 
## PSD95_N          2.379e-02  3.811e-03   6.242 9.55e-10 ***
## SNCA_N          -1.137e-02  3.484e-02  -0.326 0.744195    
## Ubiquitin_N      2.330e-03  4.883e-03   0.477 0.633468    
## pGSK3B_Tyr216_N  2.808e-02  1.006e-02   2.793 0.005439 ** 
## SHH_N            4.904e-02  1.998e-02   2.454 0.014471 *  
## BAD_N           -3.592e-02  3.508e-02  -1.024 0.306311    
## BCL2_N           6.173e-03  3.112e-02   0.198 0.842840    
## pCFOS_N         -1.232e-02  3.111e-02  -0.396 0.692240    
## SYP_N            1.354e-02  1.079e-02   1.254 0.210395    
## H3AcK18_N        2.401e-03  2.527e-02   0.095 0.924349    
## EGR1_N          -6.016e-03  2.619e-02  -0.230 0.818406    
## H3MeK4_N         1.227e-02  2.154e-02   0.570 0.569097    
## CaNA_N          -1.749e-03  3.419e-03  -0.512 0.609159    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.005885 on 476 degrees of freedom
##   (528 observations deleted due to missingness)
## Multiple R-squared:  0.8641, Adjusted R-squared:  0.8427 
## F-statistic: 40.37 on 75 and 476 DF,  p-value: < 2.2e-16

Let’s diagnose the constructed linear model. Multicollinearity:

sqrt(vif(fit))
##        DYRK1A_N         ITSN1_N          BDNF_N           NR1_N          NR2A_N 
##        4.872023        7.711915        4.082843        9.813852        5.012776 
##          pAKT_N         pBRAF_N       pCAMKII_N         pCREB_N          pELK_N 
##        4.677621        3.109888        4.387688        4.173886        1.607783 
##          pERK_N          pJNK_N          PKCA_N          pMEK_N          pNR1_N 
##        7.465857        5.148647        6.217226        4.861493        6.661844 
##         pNR2A_N         pNR2B_N        pPKCAB_N          pRSK_N           AKT_N 
##        5.749259        7.471714        5.747702        3.989231        4.241897 
##          BRAF_N        CAMKII_N          CREB_N           ELK_N           ERK_N 
##        6.456818        4.007758        3.161083        5.463030        7.432996 
##         GSK3B_N           JNK_N           MEK_N          TRKA_N           RSK_N 
##        5.728862        4.358639        3.953392        8.513574        4.029508 
##           APP_N      Bcatenin_N          SOD1_N          MTOR_N           P38_N 
##        4.643652        9.591355        4.484240        4.226200        4.466973 
##         pMTOR_N         DSCR1_N         AMPKA_N          NR2B_N         pNUMB_N 
##        4.231519        3.816764        5.206541        3.319314        4.479676 
##        RAPTOR_N         TIAM1_N        pP70S6_N          NUMB_N         P70S6_N 
##        4.373510        4.668370        3.128047        4.845890        3.676441 
##        pGSK3B_N         pPKCG_N          CDK5_N            S6_N        ADARB1_N 
##        3.478913        4.695966        1.725275        4.183738        3.055476 
##    AcetylH3K9_N          RRP1_N           BAX_N           ARC_N          nNOS_N 
##        6.540512        1.310133        2.840968        3.824210        3.096720 
##           Tau_N          GFAP_N         GluR3_N         GluR4_N          IL1B_N 
##        4.222491        2.328700        3.273998        2.829308        3.094368 
##         P3525_N        pCASP9_N         PSD95_N          SNCA_N     Ubiquitin_N 
##        2.810073        3.013767        3.808100        3.048122        3.245402 
## pGSK3B_Tyr216_N           SHH_N           BAD_N          BCL2_N         pCFOS_N 
##        3.369730        2.258544        3.662395        2.912133        2.522871 
##           SYP_N       H3AcK18_N          EGR1_N        H3MeK4_N          CaNA_N 
##        3.070787        6.027909        3.739159        3.911425        4.442898

There is great collinearity between variables.

Resudials graph (Picture 6):

theme_set(theme_bw())
model_diag <- fortify(fit)
ggplot(data = model_diag, aes(x = .fitted, y = .stdresid)) + 
  geom_point() + 
  geom_hline(yintercept = 0) +
  geom_smooth() +
  geom_hline(yintercept = 2, color = "red") +
  geom_hline(yintercept = -2, color = "red")

Picture 6 The resudials graph.

There is no pattern.

Graph of Cook’s distances (Picture 7):

theme_set(theme_bw())
ggplot(mod_diag, aes(x = 1:nrow(mod_diag), y = .cooksd)) + geom_bar(stat = "identity") + ggtitle("Graph of Cook's distances")

Picture 7 Graph of Cook’s distances

There is no influential observations.

Normal distribution of residues (Picture 8):

qqPlot(model_diag$.stdresid, main = 'Graph Q-Q')

## [1]  37 389

Picture 8 Q-Q graph.

There is a deviation from the normal distribution.

Total, this is a bad model in all respects. It is necessary to exclude multicollinearity between variables.

Task4. PCA analisys.

Delete all NA values.

data_mice_num_na <- na.omit(data_mice_num)
mice_pca <- rda(data_mice_num_na, center = TRUE, scale = TRUE)
head(summary(mice_pca))
## 
## Call:
## rda(X = data_mice_num_na, scale = TRUE, center = TRUE) 
## 
## Partitioning of correlations:
##               Inertia Proportion
## Total              76          1
## Unconstrained      76          1
## 
## Eigenvalues, and their contribution to the correlations 
## 
## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5     PC6    PC7
## Eigenvalue            22.8092 13.0296 7.58645 6.89765 3.28216 3.02896 2.3636
## Proportion Explained   0.3001  0.1714 0.09982 0.09076 0.04319 0.03985 0.0311
## Cumulative Proportion  0.3001  0.4716 0.57139 0.66214 0.70533 0.74519 0.7763
##                           PC8     PC9    PC10    PC11    PC12    PC13     PC14
## Eigenvalue            2.25184 1.80282 1.27298 1.10782 0.97653 0.84755 0.700418
## Proportion Explained  0.02963 0.02372 0.01675 0.01458 0.01285 0.01115 0.009216
## Cumulative Proportion 0.80591 0.82964 0.84639 0.86096 0.87381 0.88496 0.894179
##                           PC15     PC16     PC17     PC18     PC19     PC20
## Eigenvalue            0.597104 0.567936 0.520420 0.483436 0.414560 0.395137
## Proportion Explained  0.007857 0.007473 0.006848 0.006361 0.005455 0.005199
## Cumulative Proportion 0.902036 0.909509 0.916357 0.922718 0.928172 0.933371
##                           PC21     PC22     PC23     PC24     PC25     PC26
## Eigenvalue            0.376634 0.331066 0.318904 0.291320 0.256528 0.241724
## Proportion Explained  0.004956 0.004356 0.004196 0.003833 0.003375 0.003181
## Cumulative Proportion 0.938327 0.942683 0.946879 0.950713 0.954088 0.957268
##                           PC27     PC28   PC29     PC30     PC31     PC32
## Eigenvalue            0.215677 0.205354 0.1824 0.165090 0.150582 0.135747
## Proportion Explained  0.002838 0.002702 0.0024 0.002172 0.001981 0.001786
## Cumulative Proportion 0.960106 0.962808 0.9652 0.967381 0.969362 0.971149
##                           PC33     PC34     PC35     PC36     PC37     PC38
## Eigenvalue            0.131546 0.124803 0.121780 0.106629 0.100205 0.094696
## Proportion Explained  0.001731 0.001642 0.001602 0.001403 0.001318 0.001246
## Cumulative Proportion 0.972879 0.974522 0.976124 0.977527 0.978845 0.980091
##                          PC39     PC40     PC41      PC42      PC43     PC44
## Eigenvalue            0.09118 0.087324 0.080080 0.0755388 0.0725867 0.071899
## Proportion Explained  0.00120 0.001149 0.001054 0.0009939 0.0009551 0.000946
## Cumulative Proportion 0.98129 0.982440 0.983494 0.9844878 0.9854429 0.986389
##                            PC45      PC46      PC47      PC48      PC49
## Eigenvalue            0.0678445 0.0643947 0.0621993 0.0570225 0.0543054
## Proportion Explained  0.0008927 0.0008473 0.0008184 0.0007503 0.0007145
## Cumulative Proportion 0.9872816 0.9881289 0.9889473 0.9896976 0.9904121
##                            PC50      PC51      PC52      PC53      PC54
## Eigenvalue            0.0514765 0.0494509 0.0466105 0.0455651 0.0443141
## Proportion Explained  0.0006773 0.0006507 0.0006133 0.0005995 0.0005831
## Cumulative Proportion 0.9910895 0.9917401 0.9923534 0.9929530 0.9935360
##                            PC55      PC56      PC57      PC58      PC59
## Eigenvalue            0.0410658 0.0401352 0.0362324 0.0329131 0.0321704
## Proportion Explained  0.0005403 0.0005281 0.0004767 0.0004331 0.0004233
## Cumulative Proportion 0.9940764 0.9946045 0.9950812 0.9955143 0.9959376
##                            PC60      PC61    PC62      PC63      PC64      PC65
## Eigenvalue            0.0312727 0.0306336 0.02812 0.0268910 0.0238907 0.0219182
## Proportion Explained  0.0004115 0.0004031 0.00037 0.0003538 0.0003144 0.0002884
## Cumulative Proportion 0.9963491 0.9967521 0.99712 0.9974759 0.9977903 0.9980787
##                            PC66      PC67      PC68    PC69      PC70      PC71
## Eigenvalue            0.0198319 0.0187507 0.0181373 0.01672 0.0138757 0.0124744
## Proportion Explained  0.0002609 0.0002467 0.0002386 0.00022 0.0001826 0.0001641
## Cumulative Proportion 0.9983396 0.9985863 0.9988250 0.99904 0.9992276 0.9993917
##                           PC72      PC73      PC74      PC75      PC76
## Eigenvalue            0.011552 0.0110707 0.0088824 0.0086210 6.105e-03
## Proportion Explained  0.000152 0.0001457 0.0001169 0.0001134 8.034e-05
## Cumulative Proportion 0.999544 0.9996894 0.9998062 0.9999197 1.000e+00
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  14.30511 
## 
## 
## Species scores
## 
##              PC1     PC2      PC3     PC4      PC5      PC6
## DYRK1A_N -0.4340 -1.1005  0.16156 -0.5787  0.40956 -0.29336
## ITSN1_N  -0.7529 -1.1213  0.02545 -0.3885  0.31662 -0.43080
## BDNF_N   -1.4617 -0.2776  0.13861 -0.2754  0.07777  0.02370
## NR1_N    -1.4639 -0.2588  0.38581  0.2632 -0.13228  0.06977
## NR2A_N   -1.3261 -0.3478  0.56854  0.2773 -0.02636  0.03598
## pAKT_N   -1.0057  0.9344 -0.23091 -0.5486 -0.35948  0.03822
## ....                                                       
## 
## 
## Site scores (weighted sums of species scores)
## 
##          PC1     PC2       PC3    PC4    PC5    PC6
## 76   -0.7992 -0.5924  0.141176 0.5301 0.8206 0.3479
## 77   -0.7345 -0.5226  0.055906 0.5546 1.0258 0.4589
## 78   -0.8737 -0.5139  0.107993 0.4551 1.0148 0.3301
## 79   -0.3142 -0.5933  0.192511 0.2867 0.1504 0.5693
## 80   -0.3688 -0.4951  0.001643 0.3092 0.5272 0.6118
## 81   -0.3718 -0.4844 -0.015590 0.3715 0.4812 0.5097
## ....

Ordination plot using ggplot2 (Picture 9):

df_scores <- data.frame(data_mice_num_na,
                        scores(mice_pca, display = "sites", choices = c(1, 2, 3), scaling = "sites"))
data_mice_na <- na.omit(data_mice)
df_scores$class <- data_mice_na$class

p_scores <- ggplot(df_scores, aes(x = PC1, y = PC2)) + 
  geom_point(aes( color = class), alpha = 0.5) +
  coord_equal(xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2)) + ggtitle(label = "Ordination plot") + theme_bw()
p_scores

Picture 9 Ordination plot

Now we can build scree plot of the variance (i.e. sdev^2) for every principle component (Picture 10).

mice2_pca = princomp(data_mice_num_na, center = TRUE, scale = TRUE)
## Warning: In princomp.default(data_mice_num_na, center = TRUE, scale = TRUE) :
##  extra arguments 'center', 'scale' will be disregarded
screeplot(mice2_pca, type = "l", npcs = 7, xlab = "Components", ylab = "Variance", main = "Screeplot of the first 7 PCs")

Picture 10 Screeplot of the first 7 PCs.

Also we can build Cumulative variance plot (Picture 11).

plot(cumsum(mice2_pca$sdev^2 / sum(mice2_pca$sdev^2)), type= "b", xlab = "PCs", ylab = "Amount of explained variance", main = "Cumulative variance plot")
abline(v = 3, col="blue", lty=5)
abline(h = 0.995, col="blue", lty=5)

Picture 11 Cumulative variance plot for all PCs.

Contribution of each component (Picture 12):

res.pca <- prcomp(data_mice_num_na, center = T, scale = TRUE)
fviz_eig(res.pca)

Picture 12 Contribution of each component.

We notice that the first 4 explains almost 90% of variance. We can effectively reduce dimensionality from 76 to 4. We can select number of components as 4 (PC1 to PC4) and we can use these 4 components as predictor variables in new model.

data_mice_na <- na.omit(data_mice)
sc <- as.data.frame(mice2_pca$scores)
sc$class <- data_mice_na$class
all_data_mice <- cbind(data_mice_num_na, sc[,1:4])

lmodel <- lm(all_data_mice$ERBB4_N ~ Comp.1 + Comp.2 + Comp.3 + Comp.4 , data = all_data_mice)
summary(lmodel)
## 
## Call:
## lm(formula = all_data_mice$ERBB4_N ~ Comp.1 + Comp.2 + Comp.3 + 
##     Comp.4, data = all_data_mice)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.038289 -0.006572 -0.000746  0.006505  0.049965 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.1558863  0.0004773 326.596   <2e-16 ***
## Comp.1       0.0042775  0.0002968  14.410   <2e-16 ***
## Comp.2      -0.0001996  0.0003982  -0.501    0.616    
## Comp.3      -0.0015160  0.0006147  -2.466    0.014 *  
## Comp.4      -0.0144564  0.0010127 -14.275   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01121 on 547 degrees of freedom
## Multiple R-squared:  0.433,  Adjusted R-squared:  0.4289 
## F-statistic: 104.4 on 4 and 547 DF,  p-value: < 2.2e-16

When comparing the two constructed models, we can notice that the Adjusted R-squared coefficient of the second model is better than in the first one.

3D PCA plot (Picture 13):

fig <- plot_ly(sc, x =~Comp.1, y = ~Comp.2, z = ~Comp.3, color = ~class, size = 5)
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'PCA1'),
                                   yaxis = list(title = 'PCA2'),
                                   zaxis = list(title = 'PCA3')))
fig

Picture 13 3D PCA plot (P.S. I don’t understand why it can’t be drawn here. But if call “fig” in console, it is work)